ABSTRACT

This paper evaluates current practices in curriculum design and
discusses some proposals for using efficient Fisherian experimental
designs to remedy certain shortcomings in current practices. Two general
questions are approached in this paper: (1) how to obtain rational and
empirical evidence that a basic curriculum under planning and
development will do the job for which it is intended, and (2) how to
establish the conditions under which a curriculum can be most
effectively "installed" in a particular classroom to meet the needs of a
specific teacher and a specific student group. (Author/HLF)

THE DESIGN OF EXPERIMENTS AND THE DESIGN OF CURRICULUM

Robert Calfee 
Stanford University 

A curriculum, by dictionary definition, is a course of study. It
can be a collection of books and other materials. It may be a set of
teacher manuals, which form the core of many curriculum programs.
Curricula can range from scope and sequence charts to detailed writeups,
from general discussion of concepts and strategies to exact
prescriptions.

A curriculum can be viewed as an organized structure for carrying
out instruction in some domain. The structure encompasses content (the
thing to be taught) as well as time (the order in which things are
taught). The temporal structure is especially germane when we think
about a curriculum from the student's point of view--a curriculum is
something that unfolds over time from one day to the next.

Curriculum construction is a complex operation, entailing numerous
decisions. In the case of some programs, the development process is
long, involved, and expensive--an elementary reading program entails the
work of dozens of people over years at costs in the millions. At the
other extreme is the teacher who creates his own program from day to day,
using his head and whatever resources are at his disposal. The
intermediate case is typical--the teacher uses an established curriculum
as a starting point, modifying it as necessary to meet local needs.

This paper addresses the typical case. First the paper will
critically review current practices in curriculum design. Then it will
discuss some proposals for using efficient Fisherian experimental designs
to remedy certain manifest shortcomings of current practices.

There are two general questions that I hope to answer in this paper.
How can we obtain rational and empirical evidence during the planning and
development of a basic curriculum about whether it does the job for which
it is intended? How do we establish the conditions under which a
curriculum can be most effectively "installed" in a particular classroom
to meet the needs of a specific teacher and student group?



Planning. At the present time, curriculum programs are created in
a relatively unsystematic fashion. There is a planning stage, in which
a range of alternatives is considered. A number of curriculum theorists
have discussed various ways to approach the task (Tyler, 1949; Taba,
1962; Kirst & Walker, 1971). These approaches vary considerably in
clarity, analytic rigor, and practicality. Often the planner simply
focuses on one or two specific ideas that he considers innovative and
crucial to the success of the program. For instance, he might think that
printing vowels in contrastive colors will ensure that beginning readers
learn the vowel correspondences of the English language.

Development. In the development phase, a large number of people
work together to create the curriculum. Squire (1974) has described this
interaction frankly, though perhaps too optimistically:

"How do publishers ensure that the reading materials they publish
are usable and workable in the classroom?

"Traditionally they have relied on just about every Research and
Development (R&D) resource that has been available to them.

--They select authors with practical classroom experience and
familiarity with classroom applications of research.

--They engage experienced and successful writers of literature
for children, hoping that the writers' demonstrated sensitivity
to the interests of children will provide a reservoir of
'insights' useful in writing or choosing selections for reading.

--They rely on the judgment and insights of professional reading
editors, the large majority of whom have devoted their careers
to teaching and education, and the staffs in some publishing
houses are not too unlike the education faculties in many
colleges.

--They depend in initiating new programs on the accumulated
background studies on previously published programs--the elements
in programs that worked, the elements that didn't work. It is
no accident that the majority of publishers who were strong in
reading twenty years ago continue to be strong today.

--They build on small-scale 'experimental' projects initiated by
individual schools and school systems, attempting to make the
innovative dimensions of an isolated experiment usable by
teachers everywhere.

--They call on professional scholars and successful teachers to
review manuscripts prior to publication, and today especially
they call on qualified and sensitive educational leaders to
consult on problems of cultural pluralism and sexism in content
and graphics.

--They check the readability level, the concept density, the
interest level, of particular manuscripts prior to publication
and they check the authenticity of content.

--They ask selected groups of children to read and use materials
prior to publication to obtain an indication of pupil response.

"--They organize tryouts of especially critical materials prior
to publication."

Calfee - Design    8/20/74

Much happens during development. Everyone involved makes day-to-day
decisions that affect the character and effectiveness of the
curriculum. Documentation of the growth of the curriculum is sparse
and unsystematic. Evaluation, both of the quality of the program
components and of the degree to which each component adequately
represents the original planning criteria, likewise tends to be informal,
and the influence of evaluation on the developing curriculum is a
happenstance matter.

Numerous decisions have to be made during the development phase. A
theory of instruction (Bruner, 1966, Ch. 3) should guide certain
decisions, and a curriculum can be viewed as a realization of such a
theory:

Substantive content            What should be taught?

The sequencing of content      In what order should things be taught?

The method of delivery         What materials and format should be
                               used (books, games, pictures, etc.)?

Provision for individual       How to deal with different entering
differences                    levels, rates of progress and interests;
                               how flexible should the program be?

Assessment                     How should learning be measured?
                               What feedback is to be given the
                               student? What criteria will be used
                               to evaluate progress?

As an aside, it might be interesting to reflect on the relative influence
on each of these decisions of the various involved individuals and
groups--the author, scholar, publisher, parents, community, board of
education, etc.

In principle, the original planning concepts should be preserved
during these multitudinous decisions. In actuality, the result often
departs in form and substance from the original plan. The new curriculum
is then subjected to a series of critical reviews and tryouts, and
further changes are made to render it more suitable. This process
generally yields a product acceptable in a wide variety of conventional
classrooms. If it is not noticeably more effective than other efforts, or
if it fails to meet the needs of some teachers and some students--well,
no one is perfect.

Evaluation. Just how correct are the decisions in planning and
development? And how effective is the final product? Formal evaluation is
made after planning and development. To be sure, there is argument and
debate, review and critical analysis all along the line. These are often
dignified by the term "formative evaluation." All too often this term
means that the evidence is weak and the documentation sparse or
nonexistent. The reliance on empirical data is generally slightest during
the modeling and fashioning of the curriculum, when significant change
is still possible. Only after the product is completed and hardened is
there any effort to determine effectiveness by actual performance.
Summative evaluation, as this latter activity is known, is eclectic in
character. The curriculum "as a whole" is evaluated by general measures
such as standardized achievement tests, which may have little relation
to the content or purposes that distinguish the new curriculum from its
predecessors.

A number of educators have criticized present practices in curriculum
evaluation (Scriven, 1967; Cronbach, 1963; Stake, 1966; Wittrock & Wiley,
1970; Bloom, Hastings & Madaus, 1971). Kirst and Walker (1971),
summarizing a discussion of current practices in curriculum development,
conclude that "curriculum decisions are not based on quantitative
decision techniques or even on a great deal of objective data (p. 487)."
Walker (1973) ponders the possibility that "many of us in curriculum have
at best a comparatively weak commitment to empirical research as a means
of dealing with our professional problems (p. 63)." In that paper, he
points to self-imposed restraints in curriculum research and evaluation
that make much existing work useless, such as exclusively "behavioristic"
measures of performance, and the search for isolated,
"one-thing-at-a-time" cause-effect relations.

Walker and Schaffarzick (1974) reexamined data from several
curriculum evaluation efforts, separating outcome measures that meshed
reasonably well with a given curriculum from those that did not. Their
general finding was that students do well when tested on the content they
have studied, and relatively poorly when tested on content they have not
studied--unsurprising but reassuring. Their conclusion is: "What these
studies show, apparently, is not that the new curricula are uniformly
superior to the old ones, though this may be true, but rather that
different curricula are associated with different patterns of
achievement (p. 97)." This promising though modest conclusion may
represent the apogee of current research on curriculum.

The Centrality of Evaluation
Improved evaluation is fundamental to better curriculum development,
revision, and installation practices. We do not lack for imaginative,
innovative and effective ideas about how to teach; at least this
holds for certain basic subject-matter areas like reading and to some
extent mathematics. Rather, we lack adequate means for determining which
ideas are really good, and which ones are just so-so.

The role of evidence. The collection, interpretation and weighing
of evidence should be continuous during the creation of a curriculum
program, from planning, through development, to the final stage of
installation in classrooms. Wherever decisions have to be made, the basis
for a choice can be subjective, political or empirical. If the latter is
possible, it should have priority.

Evaluation procedures should be directly linked to pertinent questions
at a given stage of development. Expert judgments are useful evidence in
many situations; anecdotal classroom observations may be more informative
than the quantitative data obtained from standardized achievement tests.
But it is important to apply minimum standards to evaluation no matter
what the context: (a) There should be an empirical basis for the
evaluation, and the evidence should be of adequate reliability. (b) The
evidence should be documented and capable of substantiation. (c)
Evaluation should be based on multiple sources of information.

Analysis of a problem. Next, consider performance-based evaluation,
in which behavioral data from students or teachers is collected as part
of evaluation. In investigating any complex system, scientific progress
often depends upon analysis of a complex problem by dividing it into
subproblems that can be studied independently. We have relatively few
guidelines as to what partitionings are suitable in curriculum research
(Walker, 1973, p. 68). For instance, the best way to teach a child to
solve quadratic equations may depend on whether he learned to add by
rote flash-card drill or by counting on his fingers, but this seems
farfetched. A priori judgment may have to guide us in deciding what
components are and are not independent until we have a more adequate
empirical base than at present. Elsewhere Floyd and I have discussed the
use of multifactor experimental designs to establish the independence of
various elements of a curriculum at the same time that evaluative data
are being collected (Calfee & Floyd, 1972).

Experimental control in planning. It seems vital to progress in
curriculum evaluation that experimental control be established over the
major decision factors in a curriculum plan, and over subsidiary factors
where feasible. Most often a new curriculum represents a single set of
decisions about content, sequence, method of delivery, individualization
and assessment. If a new curriculum incorporates a fixed set of
decisions for each component, we have no way of obtaining evidence about
the outcome under alternative decisions. A comparison of two curricula in
which choices are varied in an unsystematic manner is also uninformative,
because of uncontrolled confoundings.

To see where and how experimental curriculum research might be done,
let us consider the process of curriculum development. The initial
phase involves a structural description of the curriculum to identify and
label the decisions actually embodied in the curriculum--decisions as to
learning goals, organization of content, instructional sequence, and
necessary entry behaviors for beginning any sequence (Figure 1).
This should provide a clear picture of the decision structure
underlying the curricular sequence. In this phase, the designer thinks
about the assumptions behind particular teaching methods, content,
and materials. He assigns priorities to various learning goals.

Once certain critical decisions have been identified, alternatives
that are feasible and worthwhile can be proposed. The third phase centers
on the design of parallel experimental curriculum strands (Figure 2).
Parallel strands are built around elaboration of alternative pathways
at critical decision points, so that experimental variation is
introduced at loci judged to be of potential significance. This analysis
requires efficient design for control of multiple factors.

FIGURE 1 - Structural Analysis: At the top are blocks representing the
sequence of components (or lessons) of the existing curriculum.
One block might represent a module in which the child is taught
a set of "sight" words, or taught the principle of adding two-digit
numbers with a carry. Below, these components are placed
in a decision network illustrating a hypothetical set of choices
made by the curriculum planner. Thus, 1a, 1b, and 1c along with
1 were all candidates for the first component. In choosing 1,
the planner made a decision which limited the available components
for the next stage to 2, 2b, and 2c. The most significant
decisions, as identified by the designer and other experts, are
indicated by dotted lines. These serve as the basis for constructing
alternative curriculum strands for experimental evaluation.

FIGURE 2 - Design of Alternative Curriculum Strands: Four strands are
designated in this example. If we assume that mastery has
been achieved at each stage, then all students at a given
node can be independently assigned to the nodes leading to
the next level. This possibility prevents the design from
increasing exponentially. Major interactions thought to be
important can be investigated, and others are eliminated by
default. Ideally, the sequential patterns are less intricate
than in this example, so that the modules at each stage would
form an independent experimental design structure.

Evaluation in the classroom. When a curriculum is tried out under
real classroom conditions, it is important to control external factors,
such as variations in the teacher, the students, and the school
environment. An important question in the evaluation of any new
curriculum is the degree to which it is effective under the varied
conditions that arise in real classrooms. Control over external variation
requires that the researcher identify potentially relevant factors, and
that he select a sample of schools, teachers, and students in which these
factors are represented in a design that allows the isolation of effects
associated with such factors. By incorporating control over external
factors in the evaluation design, it is possible to measure specific
interactions between curriculum factors and external factors (this is, in
a slightly different guise, what is called aptitude-treatment
interaction), and to estimate the magnitude of generalized interactions
(for example, curriculum-by-school variability). If the specific
interactions are large, this calls for alternate versions of the
curriculum, and specification of the conditions under which one or
another version is most effective. If generalized interactions are large,
then the "transportability" of the decision is altogether questionable.

The First Grade Cooperative Reading Study (Bond & Dykstra, 1967)
exemplifies the problem. Six different reading programs were compared,
each tested by several project teams in a number of schools. Variability
between schools within a project, and variability within projects within
a program, were both as substantial as between-program variability. In
short, the program distinctions were not significantly related to
performance outcomes.

Once more, the establishment of suitable evaluation conditions rests 
on the adequacy of the experimental design. 

Measurement and evaluation. Next, there is the task of constructing
an appropriate measurement system. I will focus my remarks on student
performance measures, but the same considerations apply to teacher
measures, classroom measures, and any other source of information about
curriculum effectiveness (e.g., judgments from expert observers,
including curriculum specialists, anthropologists, etc.).

A measurement system should rest on an analysis of the essential
component processes or elements in learning. In social studies, for
instance, suppose that we were to identify as significant learning
components (a) a method for selecting relevant historical facts from a
passage, (b) techniques for organizing and memorizing such facts, (c) a
body of knowledge (historical facts) of special importance, (d) procedures
for critical analysis of a source of historical information, and (e)
techniques for comparing and contrasting two sets of historical data.
These are not the only important elements in a social studies curriculum,
but they are a reasonable starting point. These are cognitive skills,
not behavioral objectives. They are ways of thinking and solving
problems; they are knowledge. They subsume sets of specific behavioral
objectives.

Of potential relevance to the analysis of component skills is current
research on information processing (Sternberg, 1969; Anderson, 1970;
Kavanaugh & Mattingly, 1972; Lindsay & Norman, 1972; Chase, 1973;
Haber & Hershenson, 1973, Ch. 7). Information-processing models take
the form of sequentially or hierarchically organized structures of
cognitive processes. After postulating a structural model for a given
task, the psychologist identifies the specific processes and factors
affecting each, and then formulates experiments to obtain evidence about
the functional independence and operation of these processes. Sternberg
proposed a simple and elegant paradigm for tackling this problem in which
a factorial design is built around various combinations of within- and
between-stage factors. If the independent-process analysis is correct,
performance in a given stage will depend only on variation in factors
associated with that stage; factors associated with other stages will not
affect the performance of this stage either directly or by way of
interactions. (For an extension of this technique, cf. Calfee, 1970,
1974; for a different approach to the same problem, see Carroll, 1974.)
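
The logic of the independence test can be sketched numerically. The
fragment below is an illustration only and not a reconstruction of
Sternberg's procedure: the two stages, the millisecond values, and the
function names are all hypothetical. It builds a 2 x 2 factorial in which
each factor is assumed to affect a different stage, so that each cell mean
is a sum of stage durations, and then computes the interaction contrast.
The contrast is zero when the stages are independent and nonzero when a
non-additive component is added to one cell.

```python
from itertools import product

# Hypothetical stage durations (ms) for each level of two factors.
# Factor A is assumed to affect only stage 1, factor B only stage 2.
STAGE1_MS = {0: 200, 1: 260}   # levels of factor A
STAGE2_MS = {0: 300, 1: 380}   # levels of factor B


def cell_means(interaction_ms=0):
    """Mean response time per (A, B) cell.

    interaction_ms adds a non-additive penalty to the (1, 1) cell,
    mimicking two factors that influence a shared stage.
    """
    means = {}
    for a, b in product((0, 1), repeat=2):
        extra = interaction_ms if (a == 1 and b == 1) else 0
        means[(a, b)] = STAGE1_MS[a] + STAGE2_MS[b] + extra
    return means


def interaction_contrast(means):
    """(cell11 - cell10) - (cell01 - cell00); zero under additivity."""
    return (means[(1, 1)] - means[(1, 0)]) - (means[(0, 1)] - means[(0, 0)])


additive = interaction_contrast(cell_means())
non_additive = interaction_contrast(cell_means(interaction_ms=40))
print(additive, non_additive)
```

In a real assessment battery the cell means would be estimated from
student data, and the contrast would be tested against its error term
rather than compared with exactly zero.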

In the social studies example, this approach would require that we
answer two questions for each of the five components: What experimental
variables are likely to influence this component directly? How can the
operation of the component be measured? For instance, in component (b),
techniques for organizing and memorizing facts, a relevant factor might
be the method by which a student is taught to organize and memorize. One
method might be to arrange the facts in a hierarchical structure and
teach by rote repetition; another might be to organize the information in
any way and use mnemonic techniques such as the method of loci for
memorization. Measurements of this process might include asking the
student what he was doing, examining the organizational character of the
protocols, or measuring total recall of a body of historical facts,
either immediately or after a delay. If memory is a process in the
acquisition of social studies, a factorial design including a range of
variables should reveal that recall is affected only by memory factors,
and not by factors affecting other components.

From the perspective of the cognitive psychologist, assessment
batteries created in this fashion are factorial experiments. From the
perspective of the educational psychologist or curriculum evaluator,
these batteries can be viewed as tests or assessment instruments. The
data can be examined in a straightforward way to answer questions about
the relative importance of each factor, and about sources of substantial
individual differences. There is no need to resort to factor analysis,
or attempt to construct "factor-pure" tests. The experimental design
is self-confirming and self-correcting with regard to the validity of the
underlying process model.

Fisherian Experimental Designs in Curriculum Planning and Evaluation
At each of the points touched on above, one requirement for
establishment of adequate control was the application of experimental
design procedures. The past three decades have seen the widespread
acceptance of factorial designs in psychology and education.
Hierarchical designs are now commonplace, and Latin and Graeco-Latin
squares are in frequent use for control of nuisance factors. There is
increasing sophistication in the statistical and interpretive analysis of
such designs, especially with regard to questions of generalization to
various populations (Cronbach, Gleser, Nanda & Rajaratnam, 1972).

The approach has its detractors (cf. Stufflebeam, 1971, for a review
of this issue). Some argue that experimental control over school-related
research is impractical or unnecessary, and that those factors over which
control can be maintained are likely to be trivial. Others confuse
design with analysis, and promote multiple regression as preferable to
analysis of variance techniques. Sometimes one procedure will do a
better job, sometimes the other, but they are based on the same
underlying model, and used properly both techniques ordinarily give a
similar answer.

The following points about Fisherian designs and analysis of
variance bear specifically on the evaluation of curriculum and
instructional programs:

(a) The model for Fisherian designs, the general linear model,
provides a simple and elegant model where theory is vague, misleading
or altogether lacking.

(b) The a priori arrangement of factors into orthogonal structures
substantially increases the sensitivity of a curriculum experiment
to questions of interest, compared to naturalistic research. Without
a priori control, factors are likely to be partly or fully confounded,
and while techniques exist for a posteriori "adjustment" of data,
none of these has the power of a balanced, orthogonal design.

(c) Fractional designs, a natural extension of full factorial
designs, provide all the advantages of orthogonality, but permit
a relatively small amount of data to be used to answer a large
number of questions.
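
Point (b) can be made concrete with a small sketch. The data here are
hypothetical, and the "naturalistic" sample is an invented illustration
of teachers choosing their own combination of two methods. The fragment
computes the correlation between the two factor columns. In the balanced
design the factors are uncorrelated (orthogonal), whereas in the
self-selected sample they are substantially correlated, which is the
confounding that a posteriori adjustment must later try to undo.

```python
from itertools import product


def correlation(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Balanced a priori design: every (A, B) combination appears 5 times.
balanced = [cell for cell in product((-1, 1), repeat=2) for _ in range(5)]

# Hypothetical "naturalistic" sample of 20 classrooms in which level +1
# of factor A mostly co-occurs with level +1 of factor B.
naturalistic = [(1, 1)] * 8 + [(-1, -1)] * 8 + [(1, -1)] * 2 + [(-1, 1)] * 2

r_balanced = correlation(*zip(*balanced))
r_natural = correlation(*zip(*naturalistic))
print(r_balanced, r_natural)
```

With these invented counts the balanced design gives a correlation of
zero and the self-selected sample a correlation of about 0.6, so in the
second case an apparent effect of A cannot be cleanly separated from an
effect of B.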

The general linear model. In its most general form, the linear
model is

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n + e$$

where $Y$ is a criterion measure, the $X_i$ are factors to be used in
predicting $Y$, $\beta_i$ is the weight of factor $i$, and $\beta_0$ and
$e$ are baseline and error parameters, respectively (Morrison, 1967; Rao,
1965; Scheffé, 1959). The model can be formulated for the multivariate as
well as the univariate case. It is the basis for analysis of variance and
multiple regression analysis, both of which are flexible and robust
techniques.
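
As a minimal sketch of how the weights are recovered in an orthogonal
design, the fragment below generates criterion scores from a linear
model with two two-level factors and then estimates the baseline and the
two factor weights. Everything specific is an assumption made for
illustration: the weights are invented, the error term is set to zero so
that the arithmetic can be checked by hand, and the factor levels are
coded -1 and +1. Under that coding the columns of a full factorial are
mutually orthogonal, and the least-squares estimate for each weight
reduces to a simple average of signed scores.

```python
from itertools import product

# Hypothetical true weights: baseline, factor X1, factor X2.
TRUE_WEIGHTS = (50.0, 4.0, -2.0)

# Full 2 x 2 design with levels coded -1 / +1, one observation per cell.
# The leading 1 in each row carries the baseline term.
design = [(1, x1, x2) for x1, x2 in product((-1, 1), repeat=2)]

# Criterion scores generated from the model with the error term set to zero.
scores = [sum(w * x for w, x in zip(TRUE_WEIGHTS, row)) for row in design]


def estimate_weights(rows, ys):
    """Least-squares weights for mutually orthogonal -1/+1 columns.

    Because the columns are orthogonal and each has squared length N,
    the estimate for column j is sum(x_ij * y_i) / N.
    """
    n = len(rows)
    return tuple(
        sum(row[j] * y for row, y in zip(rows, ys)) / n
        for j in range(len(rows[0]))
    )


estimated = estimate_weights(design, scores)
print(estimated)
```

The shortcut in `estimate_weights` holds only because the design is
balanced; with confounded factors the full normal equations would have to
be solved, and the estimates would no longer be independent of one
another.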

The power of the linear model as a substantive model is often
overlooked (Suppes, 1974). Current applications emphasize tests of the
null hypothesis, but the machinery exists for more explicit tests of
parameter values and for measuring the relative influence of factors in a
set by examining components of variance.

The linear model also provides a readymade system for handling
several other problems: What measurement scale provides the most
parsimonious description of a set of data? What is the magnitude of
interactions among a set of basic predictor factors? What is the
magnitude of confoundings among predictor factors?

In short, the linear model stands ready to serve the needs of curri- 
culum research. It is able to handle the complexities of this area in 
a flexible and statistically powerful fashion. Theory and techniques of 
analysis have been worked out in great detail. 

A priori design and a posteriori analysis. Fisherian designs require
strong control over the selection of experimental units and the assignment
of treatments to units. These requirements weigh heavily on the shoulders
of those of us who try to do educational research. Schools are
distrustful and cynical about the value of research, and protective of
themselves. It is easier to find some school that will let you carry out
research than to try to gain entry to the school called for by the
design. It is easier if a teacher volunteers for a treatment than to
arrange for a teacher to follow a treatment according to a design. The
path of least resistance for the investigator is to simply look around
for preexisting treatment-unit combinations, and then to see what design
he has.

What is wrong with the path of least resistance? The answer lies
in the presence of substantially confounded factors, with consequent
lack of control. Not everyone agrees that a priori design control is
important. Cohen (1970) has proposed techniques for "naturalistic"
design. A posteriori adjustment of confounded data is possible under
certain highly restrictive assumptions (Elashoff, 1969). And the
possibility of causal inference from correlational data has been
considered (Wittrock and Wiley, 1970). However, a posteriori techniques
are much weaker statistically than comparable a priori designs. They rest
on strong assumptions, frequently untenable. And they are of little use
to the curriculum planner and developer, because the natural process of
curriculum development provides little "natural" variation of the sort
needed for empirical testing of specific hypotheses.

Fractional Designs and Efficient Multifactor Experiments

Educational and experimental psychologists usually think of a
factorial design as a between-subjects arrangement consisting of all
combinations of two or three factors, with 10 or more subjects per
combination. "As everyone knows," it is essential to have a large number
of subjects per combination, because if the sample size is too small then
the statistical test will be weakened. In fact, this
"sample-size-per-cell" requirement is based on misunderstandings about
what is being tested in factorial designs.

The high cost of this paradigm severely restricts the experimenter's
ability to investigate complex problems. With three two-level factors,
there are 2 x 2 x 2 or eight combinations, which at 10 subjects per
combination amounts to 80 subjects. The increase is geometric with
additional factors and levels per factor. Designs with more than five
factors are impractical because of the cost--even if the subjects are
students. If they are teachers or programs, the constraints are even
tighter, and designs with one or two factors are about the limit.
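
The geometric growth is easy to tabulate. The short sketch below simply
multiplies out the number of cells in a full between-subjects factorial
and applies the conventional 10 subjects per cell mentioned above; the
function name is hypothetical and the figure of 10 is used only because
the text uses it.

```python
def subjects_needed(levels_per_factor, subjects_per_cell=10):
    """Total subjects for a full between-subjects factorial design."""
    cells = 1
    for levels in levels_per_factor:
        cells *= levels
    return cells * subjects_per_cell


# Cost of adding two-level factors one at a time.
for k in range(1, 7):
    print(k, "two-level factors:", subjects_needed([2] * k), "subjects")
```

Three two-level factors give the 80 subjects cited in the text, and each
further two-level factor doubles the requirement.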

A contrasting paradigm is the within-subject design, now common in
behavioral experiments. When each subject is tested under several
factorial combinations the statistical tests are quite sensitive. Usually
the subject is tested only once under each combination, and the "sample
size" requirement is conveniently ignored. Control for order and
materials variables is usually achieved with a Latin square or
Graeco-Latin square. The analysis of correlated data from such designs
raises problems, but techniques for handling these are being continually
refined. All in all, within-unit designs constitute a powerful
methodology for studying certain behavioral questions.

Even here, there are practical limits to the number of factors that
can be examined, as long as full factorial designs are employed. A
design with six two-level factors has 64 combinations. If each treatment
combination takes a minute or more, the "50-minute hour" limit typical
of much psychological research is violated. And in curriculum research,
we may be talking about treatment combinations that require days, weeks
or months to administer.

Fractional designs provide an efficient alternative to full factorial
design. These designs are not new; the basic procedures have been
available for at least 40 years, and are described in a number of
standard texts (Kirk, 1969; Winer, 1971). For some reason, they have seen
little use in the behavioral sciences, except for the special case of
Latin squares.

Two related design procedures comprise fractional designs:
fractional-factorial and confounded-blocks designs. In a
fractional-factorial design, the experimenter selects a balanced fraction
of cells from the full design. In a confounded-blocks design the full
design is divided into orthogonal blocks, each of which is assigned to a
different experimental unit. Many applications involve combining these
two design techniques; a fraction of the full design is selected and then
broken into blocks, each of which is assigned to a different unit.

The basic concepts of fractional-factorial and confounded-blocks
designs with two-level factors will be illustrated by an example in
which there are three two-level factors, as shown in Figure 3A. In a
2^3 design, eight degrees of freedom are available for estimating the
grand mean, main effects, and interactions, each with one degree of
freedom.

The two sets of cells in the design in Figure 3A labeled + and -
represent the two halves of the full design defined by the ABC
interaction. The ABC interaction, which in this example has been used to
divide the full 2^3 design into two balanced chunks, is called the defining
contrast. Consider the consequences of carrying out an experiment using
only the + cells of the design. There is one degree of freedom for
estimating the grand mean, and three degrees of freedom for estimating
treatment effects. The ABC effect in this fractional design is the same
as the grand mean; hence, information on ABC is lost. Furthermore, the
estimates of the following pairs of effects are also identical and hence
confounded:

     A = BC
     B = AC
     C = AB

Two confounded effects such as A and BC are referred to as aliased.
Figure 3B shows the analysis of variance source table for this
design. Each source is redefined in terms of the aliasing patterns,
using ABC as the defining contrast. The cost of cutting the full design
in half is that information about interactions is lost, and so the
experimenter must think seriously about what hypotheses are really
worthy of investigation.

FIGURE 3A - A full 2^3 design, with three treatment factors A,
            B, and C each at two levels, 0 and 1. Cells
            containing +'s and -'s represent the two levels of
            the ABC interaction.

FIGURE 3B - ANOVA source table for a 1/2-replicate of a 2^3 design,
            with ABC as the defining contrast.

FIGURE 3C - ANOVA source table for the analysis of a 2^3 design run
            in two blocks. Each level of the blocks factor, X, was
            assigned to a different experimental unit, and is
            therefore a between-units factor.
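
The aliasing in the half-replicate can be checked numerically. The
sketch below is an illustration added here, not the author's procedure;
it assumes the common coding in which level 0 contributes -1 and level 1
contributes +1 to a contrast, and all names are hypothetical.

```python
from itertools import combinations, product

FACTORS = "ABC"


def sign(cell, effect):
    """Contrast sign of an effect in one cell (level 0 -> -1, 1 -> +1)."""
    s = 1
    for name in effect:
        s *= 1 if cell[FACTORS.index(name)] == 1 else -1
    return s


# All eight cells of the 2^3 design, then the half where ABC is +.
cells = list(product((0, 1), repeat=3))
plus_half = [c for c in cells if sign(c, "ABC") == +1]

# Every main effect and interaction: A, B, C, AB, AC, BC, ABC.
effects = ["".join(e) for r in (1, 2, 3) for e in combinations(FACTORS, r)]


def contrast(effect, design):
    """Column of signs for an effect over the cells of a design."""
    return tuple(sign(c, effect) for c in design)


# Two effects are aliased when their sign columns coincide in the fraction.
aliases = [(e1, e2) for e1, e2 in combinations(effects, 2)
           if contrast(e1, plus_half) == contrast(e2, plus_half)]

if __name__ == "__main__":
    print(plus_half)
    print(aliases)
```

Under this coding the ABC column is constant (+1) in the fraction, so it
cannot be separated from the grand mean.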

Next let us look at a confounded-blocks design based on the same
2^3 design. In this case, the ABC interaction is used to split the full
design into two fractions or blocks, the +'s and -'s of Figure 3A,
each of which is assigned to a different experimental unit or subject.
There is one degree of freedom for the grand mean, and seven degrees
of freedom available for estimating treatment effects. We have in
effect created a new dummy variable for blocks, designated by X, each
level of which is associated with one level of the ABC interaction;
X0 is assigned to the +'s in Figure 3A and X1 to the -'s. Since each
level of this variable has been assigned to a different experimental
unit, it constitutes a between-units effect. The aliasing patterns for
the remaining factors, all within-units effects, are:

     A = BCX          AB = CX

     B = ACX          AC = BX

     C = ABX          BC = AX

The analysis of variance for the confounded-blocks design is shown in
Figure 3C.
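
A small check of the confounded-blocks idea, written as a sketch under
stated assumptions: level 0 is coded -1 and level 1 is coded +1, and the
block contrast X is coded +1 for X0 and -1 for X1. Neither coding is
specified in the text, and the helper names are hypothetical.

```python
from itertools import product

FACTORS = "ABC"


def sign(cell, effect):
    """Contrast sign of an effect in one cell (level 0 -> -1, 1 -> +1)."""
    s = 1
    for name in effect:
        s *= 1 if cell[FACTORS.index(name)] == 1 else -1
    return s


cells = list(product((0, 1), repeat=3))

# Block X0 receives the ABC = + cells, block X1 the ABC = - cells.
blocks = {
    "X0": [c for c in cells if sign(c, "ABC") == +1],
    "X1": [c for c in cells if sign(c, "ABC") == -1],
}

WITHIN_UNIT_EFFECTS = ["A", "B", "C", "AB", "AC", "BC"]


def block_sign(cell):
    """Block contrast X: +1 for X0, -1 for X1 (assumed coding)."""
    return +1 if cell in blocks["X0"] else -1


def balanced_within_blocks(effect):
    """True if the effect's signs sum to zero inside every block, so the
    effect is estimated free of differences between units."""
    return all(sum(sign(c, effect) for c in blk) == 0
               for blk in blocks.values())


def aliased_with_block_product(effect, partner):
    """True if effect equals partner times X in every cell."""
    return all(sign(c, effect) == sign(c, partner) * block_sign(c)
               for c in cells)


if __name__ == "__main__":
    for e in WITHIN_UNIT_EFFECTS + ["ABC"]:
        print(e, balanced_within_blocks(e))
```

Only ABC fails the balance check: it is the one effect given up to the
difference between the two units.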

A Curriculum Experiment in Beginning Reading
Our first example looks at the process of planning a curriculum
experiment in beginning reading. Suppose that the curriculum can be
represented in modular form. Many reading curricula are now constructed
in this fashion, in the sense that within each lesson there are
subsections dealing with specific tasks (Figure 4). We want to focus on
three curriculum components: content, materials and format, and
management system. The curriculum is designed around a set of texts, the
target population is first graders, and the curriculum is supposed to
meet the needs of a variety of teachers and students.

A preliminary revision of an existing curriculum has been planned,
and the purpose of the experiment is to provide information about the
merits of various elements in the revision. There are 16 two-week
segments in the curriculum, and the curriculum developer thinks it feasible
to create as many as 16 variations on the basic curriculum program, and
to carry out the experiment in 32 classrooms.

Preliminary discussions have identified the following major questions:

(A) Content decisions: Basic reading instruction

    a) What is the value of a relatively strong emphasis on
       phonics/decoding skills, versus a relatively strong emphasis
       on comprehension/"reading for meaning"? The planner's
       intention is to incorporate both components in the
       curriculum, but he would like some information on the degree to
       which teachers make use of the two types of materials,
       and the amount of learning and student acceptance of these
       two types of materials at different times in the school
       year.

    b) Within the two levels of the preceding question two
       subquestions are nested:

       (1) Is phonics most effectively presented by a rule
           orientation based on learning letter-sound associations
           and blending procedures, or by a word-based
           orientation a la Bloomfield and Barnhart (1961)?

       (2) How important is vocabulary control? Is reading for
           meaning better taught with high-frequency words
           that are likely to be irregularly spelled versus
           less frequent words that are regularly spelled?

(B) Content: Skill development

    a) Does it make a difference whether or not visual-skills
       work sheets and other similar materials are included
       in a module?

    b) Ditto for auditory-skills materials?

(C) Content: Literature

    a) Does it matter whether or not materials for story-telling
       and poetry are included in a module?

    b) Ditto for creative dramatics, writing, etc.?

(D) Materials/Format

    a) Does it make a difference whether or not student
       workbooks are included in addition to the basic textbook
       materials?

    b) Ditto records, audio tapes, films, etc.?

    c) Ditto supplementary materials especially designed for very
       fast and/or very slow learners?

(E) Management

    a) Does it matter whether a module is constructed around a
       learning-to-mastery emphasis, as opposed to a
       minimal-competence or remedial model?

    b) Does it matter whether or not an assessment system is
       provided?

    c) Ditto a record-keeping system?

The questions in (A) are of fundamental importance to the construction
and refinement of curricular content in the final version of the
program. The questions in (B) through (E) are all yes-no questions.
These relate to the tendency to throw everything, including the kitchen
sink, into current curricula, a smorgasbord approach. This costs money,
makes it more difficult for a teacher to identify useful components,
and is of uncertain benefit. The answers to (B) through (D) will
provide evidence on which auxiliary components are worthwhile additions to
the basic curriculum.

The preceding questions comprise a set of twelve two-level factors.
To create a particular instructional module, we would have to consider
twelve decisions, each of which might be made in either of two ways.
For instance, one module might have (A.a) a strong phonics emphasis,
(A.b.1) a rule orientation, (B.a) no visual-skill materials, but
(B.b) auditory-skill materials, (C.a) story-telling materials, and
(C.b) creative dramatics materials, (D.a) no workbooks, (D.b) no audio
tapes and (D.c) no supplementary materials for fast/slow students, but
(E.a) a learning-to-mastery emphasis with both (E.b) an assessment
system and (E.c) a record-keeping system.

Besides the twelve planning or treatment factors described above,
it is important to control order, the time in the school year when a
given module is presented. Assume that the school year is divided into
four chunks, and that the order of module presentation is balanced
within and across chunks. Assume further that each child goes through
16 (= 2^4) modules during the school year, and so the full design
comprises 2^16 combinations. Sixteen balanced versions or blocks of the
basic curriculum are to be constructed, and so a 2^-8 fraction of the full
design is required.

This design can be planned in such a way that within-class tests of
each of the main effects are possible, as well as selected interactions.
(In fact, only seven of the 120 two-way interactions are not measurable.)
For instance, it might be useful to ask about the effectiveness of phonics
versus meaning early in the school year compared to later in the year.
The relation of assessment and record-keeping materials to mastery
learning poses another interesting interaction question. The point is
that one can handle this complex of problems in a relatively sensitive
design, with adequate control over the entire set of factors, including
order, at a cost that is feasible.
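
The counts in this passage can be verified with a short calculation.
This is a sketch that assumes the 16 module positions are represented by
four two-level order factors alongside the twelve treatment factors, an
assumption consistent with the 120 two-way interactions cited above.
Variable names are illustrative.

```python
from math import comb, log2

n_treatment_factors = 12
n_order_factors = 4                      # 16 module positions = 2**4
n_factors = n_treatment_factors + n_order_factors

full_cells = 2 ** n_factors              # size of the full factorial

n_versions = 16                          # variant curriculum programs
modules_per_version = 16                 # two-week segments in the year
cells_run = n_versions * modules_per_version

# Exponent k such that cells_run = 2**k * full_cells
fraction_exponent = int(log2(cells_run)) - n_factors

two_way_interactions = comb(n_factors, 2)

if __name__ == "__main__":
    print(full_cells, cells_run, fraction_exponent, two_way_interactions)
```

Only 256 of the 65,536 possible cells are run, yet every pair of the 16
factors defines one of the 120 two-way interactions that the plan must
keep track of.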

It was assumed earlier that a set of 16 variant curriculum programs
was to be installed in 32 classrooms. Each classroom will have students
who vary in entering ability level, sex, and other pertinent factors.
It would make sense to use such student information in the analysis.
This would permit an especially strong attack on what has been referred
to as the aptitude-treatment interaction hypothesis (Cronbach & Snow,
1969).

An even more interesting possibility presents itself. Suppose we
wanted to find out how the effect of curriculum decisions depends on
preexisting characteristics of the school and teacher. We would
need to plan a between-class design that provided control over
relevant factors that differentiate schools and teachers. For example:

(A) School characteristics

    (a) Urban/suburban

    (b) High/low socio-economic neighborhood

    (c) Self-contained/open-school plan

(B) Teacher characteristics

    (a) Experienced/beginning

    (b) Prefers to follow curriculum and teacher manual closely/
        prefers to adapt curriculum to own program

    (c) Prefers large-group instruction/small-group, independent
        work

These six two-level factors, together with the four block factors,
constitute a 2^10 design. By planning a 2^-5 fraction for the 32 teachers
available, control is maintained over school and teacher factors and the
assignment of curriculum factors to classes. The main effects of school
and teacher factors are all testable. Equally important, it is possible
to test hypotheses about interactions between school-teacher factors and
curriculum factors. For instance, what is the effect of a
learning-to-mastery component for teachers who prefer large-group
instruction, compared to those who adopt a more individualistic approach?

The major point here is that it is feasible to plan designs that
handle the complexities that arise in curriculum planning and
development, and that achieve the rigor of control deemed necessary in
behavioral experiments.

The experiment described above constitutes a broad framework within
which another level of experimental questions could be planned. For
instance, within a phonics module one could raise questions about (a)
the order in which specific letter-sound correspondences are
presented, (b) the rate at which new correspondences are introduced, or
(c) maintenance of constancy or variability in vowel patterns (e.g.,
are short vowel patterns presented first followed by long vowels, or
are both long and short vowels presented contrastively in a single
session?). By building successively more detailed designs, the
experimenter can create a hierarchy of experiments, within which might be
embedded experiments as precise as those now conducted in experimental
psychology laboratories. In the larger context of the entire study,
such studies might achieve a degree of relevance and generalizability
that they now lack.

An Experimental Study of a Curriculum for Increasing Teacher Effectiveness
As a second example, consider an experiment designed as a part of a
study of teaching at the elementary level.² The "subjects" are
classroom teachers. The curriculum is a series of modular units, each of
which focuses on a single teaching skill area. There is special interest
in the most efficient means of "delivering the message." Training must be
relatively fast, and acceptable to the majority of the teachers.

The research design to be presented can be thought of as a
combination of experimental and case-study methodology. Each teacher is to
be studied over a full school year. At intervals, the teacher is trained
on a specific instructional skill. Classroom observation provides the
major data on the effects of each training procedure. Because of the
intensive nature of training and observation, only a small number of
teachers can be studied. The design to be presented is intended to be
illustrative; alternatives to these particular design factors could be
(and would be) given serious consideration.

The primary question is: What is the effect on classroom teaching
of short-term, intensive training of teachers on specific classroom
practices? For example, suppose that more effective teachers provide
differentiated, task-specific feedback to students, and also carry out
continuous assessment of student progress. If teachers are given training
on each of these skills, which ones provide immediate payoff as measured
by a noticeable change in classroom practice?

Second, what is the relative effectiveness of different methods of
training teachers in effective classroom teaching practices? For any
skill in which training is needed, several approaches can be used:
traditional inservice methods, demonstration classes, audiovisual and
television equipment, among others. Variation in the "delivery system"
will provide a test of the relative effectiveness of different training
procedures.

Let us assume that a maximum of sixteen teachers can be studied, and
that four training modules are to be administered to each teacher during
the school year. There are two basic steps in preparing the experimental
design: first, deciding what factors to use in selecting the sample of
teachers and schools, and, second, deciding what factors are important to
the substantive content of the training modules and the method of delivery.
These will be referred to as the between-teacher and within-teacher plans,
respectively.

Here is a set of illustrative between-teacher factors for this study:

School Factors

   A. Socioeconomic status of neighborhood served by school

         0  above median

         1  below median

   B. Area served by school

         0  urban, high density

         1  suburban, small town, rural

   C. Administrative climate and control

         0  high control by principal of instructional program

         1  low control by principal of instructional program
            (laissez faire)

Teacher Factors

   D. Grade

         0  Primary

         1  Elementary

   E. Teaching Style

         0  relatively structured instructional practices

         1  relatively unstructured instructional practices

   F. Student outcomes in previous years

         0  negligible difference between actual and predicted gain

         1  large positive difference between actual and predicted gain

Other factors might be considered as serious candidates; the
problem is at least this complicated, and maybe more so. Let us take this
as a starting point, and suppose that our task is to plan a study with
the six factors above for a group of sixteen teachers. The full design
calls for 2^6 or 64 teachers. We can include 16 in the design, and so a
2^-2 or 1/4 fraction must be selected from the full design.

Figure 4 shows a plan for a one-quarter replicate of a 2^6 design
based on the school and teacher factors described earlier. We start
with the full 2^6 or 64-cell design. Two high-order interactions, ABCD
and CDEF, are used to divide the full design into four balanced sets.
In the upper half of each cell is a + or - indicating whether that
particular cell is positive or negative in the ABCD interaction. In the
lower half of each cell is a + or - for the two halves of the CDEF
interaction. Each of these interactions divides the full design into two
halves, both of which are balanced with respect to each other. The two
interactions taken together divide the full design into four pieces,
all four pieces balanced with regard to each other.

Each quarter is represented by a + or - for ABCD, and a + or - for
CDEF. Note in the figure that there are exactly 16 instances of each of
the four patterns, (++), (+-), (-+) and (--). One of these quarters,
the one with (--) in each cell, was selected at random for this
experiment. Any one of the four quarters would do equally well. These
sixteen cells are outlined in the figure. You can see the symmetry from one
quadrant to the next which reflects the balancing.

The design requires two teachers in each of eight schools. For
instance, a low income, urban, high administrative control school will be
selected in which a primary teacher is teaching in a relatively
structured fashion, with above average gain in student performance; an
elementary teacher will also be selected who uses a relatively unstructured
approach to instruction, and whose students are also relatively higher in
performance than predicted.
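
The division into quarters can be reproduced with a few lines of code.
This is an illustrative sketch rather than the plan shown in the figure:
it assumes that level 0 is coded -1 and level 1 is coded +1 when forming
interaction signs, so the cells it places in a given quarter may differ
from the figure if another coding was used. Names are hypothetical.

```python
from itertools import product

FACTORS = "ABCDEF"


def sign(cell, effect):
    """Contrast sign of an effect in one cell (level 0 -> -1, 1 -> +1)."""
    s = 1
    for name in effect:
        s *= 1 if cell[FACTORS.index(name)] == 1 else -1
    return s


cells = list(product((0, 1), repeat=6))

# Sort the 64 cells into quarters by their (ABCD, CDEF) signs.
quarters = {}
for cell in cells:
    key = (sign(cell, "ABCD"), sign(cell, "CDEF"))
    quarters.setdefault(key, []).append(cell)

chosen = quarters[(-1, -1)]          # the (--) quarter

# Group the chosen cells by school type (A, B, C); the remaining
# three levels (D, E, F) describe a teacher within that school type.
schools = {}
for cell in chosen:
    schools.setdefault(cell[:3], []).append(cell[3:])

if __name__ == "__main__":
    for school, teachers in sorted(schools.items()):
        print(school, teachers)
```

Because ABCD and CDEF are both fixed within a quarter, their product
ABEF is fixed as well, so three interactions in all are given up to
select the fraction.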

Figure 4

Plan of 2^6 experiment, showing how the ABCD and CDEF
interactions divide 64 cells into quarters. The (--)
quarter is shown in bold as a suitable fraction
for conducting an experiment.

In this study, we are mainly interested in the main effects of the
school and teacher variables, and except for a few two-way sources,
interactions can be disregarded. The primary purpose of this experiment is
not to examine differences between teachers. We need control over factors
associated with differences between teachers, and would like to know the
relative magnitude of the main effects of these factors. But the chief
purpose of the study is to examine the effectiveness of training programs,
and to see whether or not training programs are differentially effective
as a function of teacher variables. This is accomplished by the
within-teacher portion of the design discussed later.

Here are the sources in the analysis of variance that can be tested
with this design, and the hypotheses corresponding to each source:

                             Question: Is the effectiveness of the
Source                       training program different for...

A  Socioeconomic status      teachers working in above/below median
                             income schools?

B  Area                      teachers in urban/suburban schools?

C  Administrative control    teachers in a school in which principals
                             exert considerable influence on the
                             instructional programs/schools in which
                             principals exert little control?

D  Grade                     primary/elementary teachers?

E  Teaching style            teachers who employ more/less structure
                             in instruction?

F  Student outcome           teachers whose students have done
                             relatively well/teachers whose students
                             have done about as predicted?

AB                           teachers in urban/suburban schools,
                             depending on whether neighborhood
                             socioeconomic level is above/below
                             median?

CE                           teachers in "high-"/"low-control"
                             schools, depending on their teaching
                             style?

BF                           teachers in urban/suburban schools,
                             depending on whether the relative
                             student performance is above predicted/
                             about as predicted?

The interactions selected for hypothesis testing in this design are for
illustration only. As many as two or three interactions could be selected
for testing, and there would still be seven or eight degrees of freedom
for an error variance term, sufficient to estimate the error variance
for this portion of the design.

In the within-teacher portion of the design, three different sets
of factors are being proposed for variation across the four training
modules for each teacher. These include the area of training, the method
of training, and the time when training is administered.

Area of Training

   (G,H)  0  specific, differentiated feedback

          1  use of performance-based evidence
             (as opposed to opinion) in assessment

          2  procedures for continuous monitoring in
             reading and mathematics

          3  how to keep records and use them for individualization

Methods of Training

   (J,K)  0  traditional in-service training

          1  demonstration classrooms

          2  television demonstration, microteaching, etc.

          3  use of school resource personnel to install and
             support the program

Time of the Year

   (L,M)  0  October

          1  January

          2  Mid-February

          3  Early April

The four-level factors are each represented by two two-level factors for 
convenience in planning the experimental design. 
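
One way to picture this recoding is sketched below. The text does not
say which pairing of two-level codes corresponds to each of the four
levels, so the binary mapping used here (for example, level 2 becomes
the pair 1, 0) is an assumption, and the function names are hypothetical.

```python
def split_level(level):
    """Recode a four-level code 0..3 as two two-level codes."""
    if level not in (0, 1, 2, 3):
        raise ValueError("level must be 0, 1, 2 or 3")
    return (level >> 1, level & 1)       # (high bit, low bit)


def join_level(high, low):
    """Inverse of split_level: recover the four-level code."""
    return 2 * high + low


AREAS = [
    "specific, differentiated feedback",
    "performance-based evidence in assessment",
    "continuous monitoring in reading and mathematics",
    "record keeping for individualization",
]

if __name__ == "__main__":
    for level, label in enumerate(AREAS):
        g, h = split_level(level)
        print(f"area {level} -> G={g}, H={h}: {label}")
```

With this recoding the three degrees of freedom of a four-level factor
are carried by the two pseudo-factors and their interaction (G, H, and
GH for area of training).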

In the within-teacher portion of the design, we must solve the problem
of fitting a 2^6 = 64 cell design to the constraints that each of the 16
teachers receives four training modules, each of which comprises one cell
of the design. The task of planning is sufficiently complex that plans
provided by the National Bureau of Standards (1957) were used. These plans
provide the details of how to organize fractional-factorial and
confounded-blocks experiments with as few as five factors and as many as
16, for a wide range of fractional and confounded-blocks constraints.

Eight blocks of four cells each are to be selected in a balanced
fashion from the full 2^6 or 64 cell design. This is done by first
dividing the design into two halves, as shown in Figure 5, using the
6-way interaction as a contrast. One of the halves, the (-) portion
in this example, is then divided into eight blocks of four cells each
in a balanced fashion. A block comprises a balanced ordering of four
combinations of area of training, method of training, and time of
training, which can be assigned to two teachers in one of the eight schools.
The block numbers are shown in the figure. In Figure 6, the design has
been written out in a different way to show the sequence and
combinations of training conditions in each of the eight blocks.
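
The two-stage construction can be sketched in code. The blocking
contrasts used below (GHJ, GK, and HM) are hypothetical choices made
only so that the example runs; the contrasts in the National Bureau of
Standards plan are not given in the text and may well differ. The sketch
also assumes that level 0 is coded -1 and level 1 is coded +1, and that
each four-level code is twice the first pseudo-factor plus the second.

```python
from itertools import product

FACTORS = "GHJKLM"


def sign(cell, effect):
    """Contrast sign of an effect in one cell (level 0 -> -1, 1 -> +1)."""
    s = 1
    for name in effect:
        s *= 1 if cell[FACTORS.index(name)] == 1 else -1
    return s


cells = list(product((0, 1), repeat=6))

# Stage 1: keep the (-) half defined by the six-way interaction.
minus_half = [c for c in cells if sign(c, "GHJKLM") == -1]

# Stage 2: HYPOTHETICAL blocking contrasts, chosen for illustration.
BLOCKING = ("GHJ", "GK", "HM")

blocks = {}
for cell in minus_half:
    key = tuple(sign(cell, word) for word in BLOCKING)
    blocks.setdefault(key, []).append(cell)


def four_level(cell, i):
    """Recombine pseudo-factors i and i+1 into one four-level code."""
    return 2 * cell[i] + cell[i + 1]


def block_is_balanced(block):
    """True if every area, method, and time occurs once in the block."""
    return all({four_level(c, i) for c in block} == {0, 1, 2, 3}
               for i in (0, 2, 4))


if __name__ == "__main__":
    for number, block in enumerate(blocks.values(), start=1):
        rows = [(four_level(c, 0), four_level(c, 2), four_level(c, 4))
                for c in block]
        print(number, sorted(rows, key=lambda r: r[2]))  # ordered by time
```

With these contrasts each block gives a teacher every area, every
method, and every time of year exactly once. The price is that some
interaction components, GK for instance, are confounded with
differences between blocks, which is why a published plan is normally
consulted to protect the effects of greatest interest.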

Each of the eight blocks is assigned to one of the schools, and so 
each pair of teachers goes through a unique training sequence. The 
design allows all of the main hypotheses of interest to be tested. 
These are shown below: 

Within-Teacher Analysis

                             Question: What is the effect on teacher
Source                       practices of training...

                             ... in different areas such as how to use
                             differentiated feedback, continuous
                             monitoring, etc.?

                             ... using different methods of
                             presentation such as traditional
                             in-service training, television
                             equipment, etc.?

                             ... in different areas, when different
                             training methods are used; is there any
                             evidence that some areas require certain
                             training methods?

                             ... at different times in the school
                             year?

                             ... in different areas, for teachers
                             whose students performed about average
                             versus those whose students did better
                             than expected?

                             ... in different areas, for teachers with
                             more structured versus less structured
                             programs?

                             ... with different methods, for teachers
                             whose students performed as expected or
                             those whose students did better than
                             expected?

J, K by E                    ... with different methods, for teachers
(df = 3)                     with more structured versus less
                             structured programs?

Figure 5

Within-teacher design: Each of the 4-level factors is described by
two 2-level factors. The highest order interaction is used to divide
the 64 cells into two balanced halves, - and +. The - half is then
divided into eight blocks of four cells each. Each block describes
the training sequence for one school.

Figure 6

Rearrangement of within-teacher plan to show sequence in which area/
method combinations occur in each block. Each school in the
between-teacher design is assigned one of the blocks.

These hypotheses require 27 of the 48 degrees of freedom available,
leaving 21 degrees of freedom for an estimate of residual error variance.
As can be seen, numerous interactions of interest can be tested with
this design. Most of the hypotheses can be profitably broken down into
more precise questions in which specific areas and methods are compared.
This would be a more powerful analysis than the omnibus questions
presented in the table.
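
The degrees-of-freedom bookkeeping here follows from the size of the
study. A minimal check, with the 27 taken directly from the text and the
helper name chosen for illustration:

```python
def within_unit_df(n_units, observations_per_unit):
    """Degrees of freedom available for within-unit comparisons."""
    return n_units * (observations_per_unit - 1)


N_TEACHERS = 16
MODULES_PER_TEACHER = 4
HYPOTHESIS_DF = 27           # figure quoted in the text

within_df = within_unit_df(N_TEACHERS, MODULES_PER_TEACHER)
residual_df = within_df - HYPOTHESIS_DF

if __name__ == "__main__":
    print(within_df, residual_df)
```

Sixteen teachers observed four times each give 64 observations; 16
degrees of freedom are between teachers (including the grand mean) and
the remaining 48 are within teachers.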

The methodology used in these experiments differs in several ways
from that used in most traditional educational research. The
experimentally controlled variations proposed here are designed to compare
the effectiveness of several plausible alternative methods of training
and different training practices. Most traditional experiments have
compared an "experimental" treatment to a no-treatment control or a
"business-as-usual" control. In the studies proposed here, variations
in the targeted training areas and in the methods of training are
designed to isolate the effects of specific teaching skills and of
training methods used to promote acquisition of these skills.

The degree of control achieved by these designs is impressive.
Differences between teachers can be handled by the design virtually to
the practical limit of our knowledge of factors affecting teacher
effectiveness. The major hypotheses about treatment effects are all
within-teacher questions. Each teacher serves as his own control, which
allows highly sensitive tests of the treatment variations. Finally, notice
the implicit assumption that these areas of classroom practice are largely
independent of each other. We are assuming that a teacher's
effectiveness can be increased by learning skill A, regardless of whether
some other skill B has been adopted or not. Undoubtedly there are A's and
B's which are not independent, but as a starting assumption for
designing experiments, independence has the advantage of simplicity.
According to this assumption, the effectiveness of a teacher in a classroom
is a simple additive combination of the number of skill areas in which
the teacher is proficient.

In Closing 

These proposals offer an alternative to present practices in curri- 
culum research and evaluation, an alternative that is workable and may 
have considerable promise. It would reduce, if not eliminate, the dis- 
tinction between formative and summative evaluation, a distinction which 
has often justified sloppy research during the formative stages of 
curriculum development, and largely irrelevant evaluation of the final 
product. 

The cost of applying experimental design techniques to curriculum
research is probably not much more than is being spent on curriculum
evaluation in many federally sponsored labs and centers. What is
required is a more active and analytic job of thinking by researchers
during the early stages of curriculum development. Rather than waiting
until the curriculum is finished and then wheeling out a battery of
standardized tests, researchers would have to roll up their sleeves
and work with developers during creation of a curriculum. The production
of variant forms of a curriculum program to fit a design structure, the
installation of these forms in carefully selected classrooms, the
continuous assessment of those pilot versions through observation and
testing, all entail the replacement of current trial and error
procedures with a more systematic approach. The cost in dollars of using
experimental designs would be relatively modest; the cost measured in
careful thinking, precise impositions and systematic measurement would
be considerable.

The benefits seem obvious. At worst, we would obtain trustworthy 
evidence to support the claims of skeptics that it really doesn't matter 
very much what goes on in the schools. The more optimistic hope is that
by shedding light on the complex set of factors that make up an instruc- 
tional program, we would see more clearly the differences between those 
practices that promote learning and those that do not. 

Footnotes

1. Paper presented to the Curriculum Symposium at the University
of Delaware. This research was sponsored in part by a grant from the
Carnegie Corporation of New York. I am grateful to Adrian Sanford and
Annalee Elman for their comments, and to Jana Floyd, Frederick McDonald,
Patricia Elias, and Kathryn Hoover for helpful discussion on this topic.

2. This example springs from a collaborative project with Frederick
McDonald at Educational Testing Service.